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Abstract 

We combine a high-order compact finite difference scheme to approximate spatial 
derivatives and collocation techniques for the time component to numerically solve the two 
dimensional heat equation. We use two approaches to implement the collocation methods. 

The first one is based on an explicit computation of the coefficients of polynomials and the 
second one relies on differential quadrature. We compare them by studying their merits 
and analyzing their numerical performance. All our computations, based on parallel 
algorithms, are carried out on the CRAY SV1. 
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1 Introduction 

In [11] Kouatchou presented an implicit technique to numerically solve the two-dimensional 
heat equation. The method, called the implicit collocation method (ICM), consists of first 
discretizing in space (using a fourth order compact scheme) the equation. The solution is 
approximated at each spatial grid point by a polynomial depending on time. The resulting 
derivation produces a linear system of equations. The order of the method is in space the 
order of the difference approximation and in time the degree of the polynomial [5, 8, 9, 10]. 
ICM when implemented on parallel computers, allows the parallelization across both time 
and space, i.e., at each iteration of the parallel algorithm, we can obtain the solution in the 
entire spatial domain at several consecutives time levels. Kouatchou proposed two parallel 
algorithms to compare ICM and the Crank-Nicolson method (known to be implicit) in [12]. 
His numerical experiments carried out on the SGI Origin 2000, showed that under some rea- 
sonable assumptions, ICM can be more cost effective than the Crank-Nicolson method. It 
was also observed in [13, 14] that higher-order algorithms in time (as ICM) can be made com- 
petitive with conventional time-marching algorithms, particularly if high accuracy is needed. 

* E-mail: kouatchou@gsfc.nasa.gov. This author is also affiliated with Morgan State University and Ins 
research was supported by NASA under the grant No. NAGS-3508. 

* E- m ai 1 : Fabienne .Jezequel@lip6.fr. 
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In [10, 11, 12], the authors used uniform time steps to derive the system of equations. 
In addition, regular polynomials were employed for the time discretization. It is an inter- 
esting problem to not only rely on non-uniform time steps but also to include orthogonal 
polynomials. 

In this paper, we reformulate the implicit collocation method for the two dimensional 
heat equation. We present two methods to approximate the time components. The first 
method (ICM-EP), described above, is based on an explicit determination of polynomial of 
degree t that approximates at each spatial grid point the time components at v consecutive 
time levels [11, 12]. The second method (ICM-DQ) uses instead differential quadratures 
to find the solution at r prescribed time levels. The underlying approach in the second 
method still (indirectly) relies on polynomials of degree r that may be more “stable”. In our 
new formulation we employ known results to show that the proposed methods have unique 
solutions, are implicit, are stable, and produce high accurate solutions. We perform basic 
comparisons of the two methods and discuss their merits. In our numerical experiments, we 
present the accuracy of the approximated solution and the parallel efficiency of each approach. 

An outline of the paper is as follows. Section 2 presents the techniques used to discretize 
the equation in space and the concept of collocation for the time variable. Section 3 discusses 
the implementation strategies. Numerical experiments appear in Section 4. We formulate 
some conclusions in Section 5. 

2 Derivation of the System of Equations 

We consider the two dimensional heat equation: 

du 2 f d 2 u \ d 2 u \ 

— (x,y,f) = a + — (x,y,f) 1 , (x, y, t) € SI x [0, oo) (1) 

where fi = (0, 1) x (0, 1), and with the initial condition 

u(x t y,0) = i){x,y), (x,y)eft, 

and with the assumption that u{x,y,t) is known for any ( x,y ) in the boundary and for 
t > 0. We also assume that u(x, y, t) is a smooth function on fi. 

The above equation models the flow of heat in the unit square domain that is insulated 
except on the boundary, a 2 is the thermal diffusivity. 

2.1 Spatial Discretization 

Let h = 1/n be the uniform spatial mesh-width. We can subdivide the spatial domain as 
follows: 


Xi = ih , y, = jh, i,j = 0, 1 , ... , n. 

For simplicity, we write the approximated solution of u and its time derivative at the spatial 
grid points (xi,yj) as: 


du . 


Cy W = u{Xi,yj,t.), and U'^t) = —(x u y j} t). 
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At any given time t, if we use the discretization of the steady state Poisson equation 
with a fourth-order scheme [7], we can approximate the spatial derivatives of (1). We obtain 
for any interior grid point (xj,yj): 


i [W‘) + °y+i W + + ^-.(0 + 80/j(‘)] 

_ sium... (A j-TL , . JA 4-TL i fill 


= 7?[4 (^i+i,j(^) + U t j+\ (t) + + Uij~\{t)) 

+Ui+i.i+i(t) + U t -ij+\(t) + Ui, ' ■ rr ■ ’ -• 




( 2 ) 


Eq. 2 is a system of h = (n - l) 2 ordinary differential equations and for any value of f, it is 
fourth-order in space. The system can be rewritten in a matrix-vector form as: 

MU'(t) = WU(ty+ c(t), (3) 


where M = M u /n-l]n-l, W = W u V^+ijn-! and c(t) is a n vector 

containing the values of u and its time partial derivative at the boundary grid points at time 
t. Mi = tri[l, 8, l]n-ii Wt - 1 = tr»[l, 4, !]„_!, Wj = [4, -20, 4] n _i , and W l+l = <n[l,4,l]„_i 
are matrices all of order n — 1. The initial condition C7(0) is given by the function ip(x,y)- 
Here 7 n _i is the identity matrix of order n— 1 and ai, ai + i] n -i denotes the tridiagonal 

matrix whose I th row contains the values ai - X , a t and o/ + i on its subdiagonal, diagonal and 
superdiagonal respectively. The subdiagonal of the first row and the superdiagonal of the 
last row are not defined. The subscript n - 1 determines the number of rows (or block-rows 
if the oj are block-matrices). 

Because M and W are constant nonsingular matrices and that the initial and boundary 
conditions are properly known, we can derive the following result: 

Lemma 1 The system (3) has a unique solution U(t). 

The solution U{t) is not only continuous in time but is also smooth (because of previous 
assumptions) , i.e., at least the first few derivatives with respect to time exit and are continuous 
too. We can now introduce the principle of the collocation method to evaluate the time 
derivative. 


2.2 Collocation Methods 

Without lost of generality, we suppose that we are interested in finding the solution for the 
values of t in the interval [0 ,t], where r is any positive number. Consider r arbitrary mesh 
points given by: 

0 = t 0 <t i < • • < f r -2 < t T —\ = t. (4) 

The r-stage collocation method is completely defined as a function of the above time mesh 
points by assuming that the approximated solution is reduced to polynomials of degree at 
most r. We then require that the approximated solution satisfies the initial and boundary 
conditions, and that it satisfies the differential equation (3) at the collocation points t 
k = 0, • ■ • , r — 1 . We propose two approaches. 


Method 1: Explicit Polynomials (ICM-EP) 

At any spatial grid point (a^y,), we assume that the approximated solution at t k (k = 
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0, . . . ,r - 1) can be written as a polynomial of degree r: 

Ui,j{tk) ~ P%,j(tk ) = a i,j,r^k + 1 + • • • + <l llh jtk + (lijfl. 

The polynomial expressions are substituted in (2) leading to a linear system of equations 
with rn unknowns, dj , • • • , <h,j,r> for b j — 1, . . . , n — 1. We note that the dij t o are known 
and are equal to u(xi,yj, 0) = ip(xi,yj). After some algebraic manipulations we obtain the 
linear system of rn equations [11, 12]: 

AX = 5, (5) 

where A is a block-tridiagonal matrix given by 

A = trilA^A^Ai^}^. 

A[- 1 , A[ and are square matrices of order r(n — 1) defined as 

1 h 2 

A,_ 7 = tri -E,--zE' -4E,-E 

2 or 

*- J n— 1 

1 h 2 h 2 1 h 2 1 

Ai = tri - — E'-4E,4^E' + 20E,-%E'-4E 

2 cr a 2 2 a 2 

L J n — 1 

1 h 2 

A t+ i = tri -E,--zE’ -4E,-E 
Z or 

L J n- 1 

E and E' are nonsymmetric dense matrices of order r. The vector X represents the rn 
unknowns a i,j, 2 > ■ ■ ■ , an d S is the right-hand-side containing values of the solution 
and its time derivative on the boundary of the domain. 

The bandwidth of the matrix A is equal to (2n + l)r and it has r 2 (3n - 5) 2 nonzero 
entries. After the determined, it is easy to compute the solution at any point t in 

the interval [to,<r-i]- More detailed information on this approach can be seen in [11, 12]. 

Method 2: Differential Quadrature (ICM-DQ) 

In the differential quadrature method, the value of the time derivative of the solution at 
each collocation point is expressed as weighted linear sums of the function values at all the 
sampling points, i.e., 

* 

r— 1 

U'ih) = TwuU(t<), k = 0,...,r-l. (6) 

1=0 

In particular, for any spatial grid point (aq, y,), 

r- 1 

UijiU) = Ew k ,iUij(ti). 

1=0 

The weighting coefficients w k ,i (fc, l = 0, . . . , r - 1) can be determined such that Eq. 6 is 
satisfied exactly for r linearly independent test polynomials (such as Lagrange, Legendre, 
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Chebyshev, Lobatto polynomials). The w k ,i are obtained solving a linear system of equations 
[15, 17]. 

With the knowledge of the weights, we can substitute equation (6) into the system (3) 
to obtain for a typical interior grid point at time level t k \ 


T— 1 

E 

1 = i 


r— 1 


E U kA U i+\j + U lj + 1 + U i~hj + U l. ?-l) “ E&^'d 


1=1 


+2p(f/, fc +lj+1 + UjL lJ+l + + U t k + i,_i) 

= ti*, 0 (E&ij + U?j+ 1 + Ul ld + Ufj_ x + 8 U}j) t 


where p = a 2 /h 2 , and 


Vk,l = 


8p - w*,*, if k = l 


£.k,i 


40p + 8 w kyk , if k = l 


-w k ,l, if k f l 

\ 

Eq. 7 leads to the following linear system of equations 

BY = R, 


8«'fc,i, 


if k 7^ Z 


( 7 ) 


( 8 ) 


where B is a block-tridiagonal matrix given by 

B = tri [Bi-i,Bi, J5( + i] n _ x . 

Bj_i, Bi and B;+i are square matrices of order (r — l)(n — 1) defined as 

i = tri[G,F,G] n - i, 

Bi — tri[F, D, F] n -i, 

B l+ 1 = tri[G,F,G] n -\. 

Here D, F and G are square matrices of order r - 1 which entries are given by D k j = -&,i, 
F k , = u Ki and G kyi = 2p4,; (where 6 k>l = 1 if k = f and 0 otherwise). T is the unknown vector 
representing the (r-l)n values Ufa (l’< i,j < n-1 and l <k < r-1) at the collocation point. 
The matrix has 2n(r — 1) as bandwidth and (r — l)((5r— l)(n — 3) 2 + ( 16 r — 8)(n — 3) + 12r — 8) 
nonzero entries. 

It is important to note that matrices A and B in (5) and (8) respectively, have the 
same block structure but B has a smaller order and fewer nonzero entries. Though both 
methods will give approximated solutions at the same collocation points, ICM-DQ requires 
less computations because Eq. 8 has fewer unknowns compared to Eq. 5. f 

Remark 1 ICM-EP requires the explicit knowledge of the time derivative at the boundary 
grid points. Ft is not the case for ICM-DQ where we can use the quadrature weights to evaluate 
the time derivative at any grid point even on the boundary. 

The above collocation schemes are equivalent to certain Runge-Kutta methods [2, 4, 
18, 19]. The two cases are A-stable methods [1, 16] and their accuracies are related to their 
growth functions [1, 5, 8, 9]. For ICM-EP, the accuracy in time is r regardless of the choice 
of the t k . In ICM-DQ, we deal with polynomials of degree r. The accuracy of the derivatives 
in the differential quadrature is r. Then the overall accuracy of the method is r independent 
of the selection of the t k and the test polynomials. 
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Lemma 2 In the implicit collocation method for the heat equation described here, if the 
approximated solution is obtained at r consecutive time levels, the resulting method is fourth 
order in space and (at least) of order r in time. 

One of the advantages of ICM-EP is that we do not obtain the approximated solution at the 
levels t k , k = 0,. . . ,r- 1 only but also at any time t in the interval [< 0 ,< r -i]- This is possible 
because of the explicit computation of the coefficients of the polynomials. For ICM-DQ, 
the solution is directly obtained at the collocation points t, k only. However, we can generate 
Lagrange polynomials with t. k as node points to find the solution at any points in [f. 0 ,/V-i] 
too. 

A question that may arise is whether r can be arbitrary large in order to obtain high 
accurate solution. For ICM-EP, Jezequel numerically showed the existence of an optimal 
degree r beyond which the accuracy of the approximated solution will not improve because of 
round-off error propagation [10]. This optimal r depends on the number a spatial grid points 
n. The method of differential quadrature, if the collocation points are properly chosen, offers 
more “stable” polynomials. Because of the stability of its polynomials, ICM-DQ may require 
larger optimal degree r. 

Remark 2 There are other collocation approaches to discretize Eq. 1. We can mention the 
spectral method where spatial derivatives are discretized using differential quadratures and the 
time derivative by finite differences [6]. Here, the computation of spatial derivatives is global 
in the sense that all the nodes (in a given direction) are applied in the calculation.- Compared 
to ICM, we may gain in accuracy but at the expense of memory requirement ( dense matrix 
and large bandwidth), an increase of computational cost , and an additional computational 
complexity (due to the unstructure shape of the matrix). In addition, this method may intro- 
duce a stability requirement associated with the time step whereas for ICM there is no such 
requirement.. 

3 Implementation 

In this section, we want to describe how ICM with explicit polynomials (ICM-EP) and ICM 
with differential quadrature (ICM-DQ) are implemented. In fact, the implementation stategy 
for both methods is based on one algorithm presented in [12]. 

Given the number n for spatial grid points, an arbitrary time step At and the parameter 
r, we want to determine the approximated solution in the time interval [0,T]. We carry out 
rj consecutive iterations of ICM with the assumption that T = r/(r — 1)A<. . 

Let us focus on the first iteration of ICM algorithm i.e. the solution is to be found in 
the interval [0, (r - l)At]- The results at the end of [0, (r - l)At] is used as initial conditions 
for the next time interval and the length of the interval for each subsequent iteration is still 
(r — 1)A t. Both end points of the interval are in the set of the collocation points. For 
ICM-EP, the r collocation points can be arbitrarily chosen in [0, (r - l)Af], The spatial and 
time discretizations give rise to a linear system of equations, Eq. 5, which solution is the 
coefficients of the polynomials. The approximated solution is then evaluated at any point in 
the interval [0, (r — 1 ) At]. 

In ICM-DQ, the collocation points are either taken arbitrary or are the r zeros (mapped 
into the interval [0, (r - 1)A/,]) of a known polynomial of degree r. After the weights w ki i 
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are determined, we also solve a linear system of equations, Eq. 8, which directly gives the 
approximated solution at the collocation points. 

For the computation of the weighting coefficients we assume that the test polyno- 
mials (of degree at most r - 1) under consideration, are defined in the interval [—1, 1] and 
therefore the r zeros belong to the same interval. We first calculate the weights by focusing 
on the interval [—1, 1] and the corresponding weights w^,i in [0, (r — l)Af] are just the product 
of the previous ones by a unique constant that is function of the the smallest and largest zeros 
in [-1, 1] and the two end points in [0, (r - l)At]. These weights are obtained by solving r 
linear system of equations with r unknowns each. All the systems have the same underlying 
matrix and the only change is the right-hand-side. 

Remark 3 There are more explicit formulas to determine the weighting coefficients [6]. They 
depend on the choice of the polynomials. For the sake of using a unique algorithm to find the 
weights for all the cases presented here, we use linear systems of equations. To avoid any ill 
conditioned problems related to those systems, we only consider small values of r ( less than 
10). Our numerical experiments suggest that the use of larger values of r do not improve the 
overall accuracy of ICM. 

The implementations of ICM-EP and ICM-DQ are presented in Algorithm 1 and Algo- 
rithm 2 respectively. 

Algorithm 1 

1. Define the matrix A 

2. Decompose the matrix A to obtain the matrix A & A~ l 

3. For l — 1 — > r), do: 

4. Define the right hand side S 

5. Determine the coefficients AAX = AS 1 

6. Find Uij(tk) at (r — 1) collocation points by computing Pij{tk) 

7. End do 

Algorithm 2 

1. Compute the weights Wk,i 

2. Define the matrix B 

3. Decompose the matrix B to obtain the matrix B « B 

4. For l — 1 — > rj, do: 

5. » Define the right hand side R 

6. Find Uij(tk) at (r — 1) collocation points: BBY - 

7. End do 

In ICM-EP, the order of the matrix is rn whereas for ICM-DQ it is (r — l)n. The systems 
are solved by first preconditioning the matrices and then using the general minimal residual 
technique as the iterative accelerator [20]. There is no major difference between Algorithm 1 
and Algorithm 2. Algorithm 2 differs from Algorithm 1 in two basic points: 1- It ends when 
the solution of the linear system is found (Step 6) and, 2- In its initialization phase we need 
to evaluate the weights U)k,i (Step 1) which computational cost is negligible with respect to 
the one of the entire algorithm. 


-l 


I 
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As n increases, most of the time in the two algorithms is devoted to finding the solution 
of the linear systems. Since ICM-DQ has fewer unknowns and non-zero entries in its matrix- 
equation, it uses less memory and will require less CPU time. 

4 Numerical Experiments 

Tn this section, we compare ICM-EP and ICM-DQ when they are both implemented on a 
shared memory vector computer, namely the CRAY SVl (24 processors each running at 300 
Mhz and 8 GB memory). The programs were coded in Fortran 90 programming language in 
double precision with 04-bit arithmetic. The parallel implementation of the two algorithms 
was achieved by introducing OpenMP directives in the code. 

For all our simulations, we consider Eq. 1 with the exact solution given by 

t i r 

u{x,y,t) = e _ s cos —(x + y) + e~ 2t smir(x - y). 

and a = l/vr. Initial conditions and boundary conditions at any given time are taken from 
the above exact solution. 

In all our numerical experiments, we assume that the target T where we want to get the 
approximated solution is equal to 10.0. The two algorithms are iterated until T is reached. 
We focus on the accuracy of the two methods and the computing time they require. 

We first determine the elapsed times obtained with the two methods. It is important 
to note that for a given set the parameters (n and r), the computational cost of ICM-DQ 
remains the same regardless of the choice of the collocation points and the test polynomials. 
We consider r — 4, n = 16, and A t = 10 2 and record in Table 1 the computing times (in 
seconds) as function of the number of processors (#CPUs). We note that in both cases, 
as the number of processors increases (from 1 to 8), the computing time decreases. For a 
given number of processors, ICM-DQ requires the smallest time. This result was predicted 
in the previous section. In addition, ICM-DQ displays a slightly better speedup. It is worth 
mentioning that the Cray SVl is not a dedicated computer, i.e., the requested processors are 
not reserved to a particular user during his run. The elapsed time measures the interval of 
time between the beginning of the run and its end. It includes any possible idle time. 


#CPUs 

ICM-EP 

ICM-DQ 

1 

236.8 

145.9 

2 

102.2 

62.41 

4 

69.84 

41.48 

8 

56.51 

33.61 


Table 1: Elapsed time (in seconds) as function of the number of processors when n = 16 and 
r = 4. 


Now we want to test the accuracy of ICM-EP and ICM-DQ. We initially compute the 
maximum error (at T = 10.0) obtained with both methods when At = 10 -2 , r = 4 and n 
varies. For ICM-DQ, we use as test functions regular polynomials (the set {1, t , . . . f r_1 }), 
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the Lobatto, Legendre and Cliebyshev polynomials. The results are presented in Table 2. 
We observe that in all the ■causes, the maximum errors-decrease by a factor of 16 when n is 
doubled ."'Thus TOM' Solut ions 'deTnonstrate foiYrth-order convergence in space and all of the 
methods offer comparable accuracies. 



ICM-EP 

ICM-DQ 

n 


Regular 

Lobatto 

Legendre 

Chebyshev 

4 

4.13(08) 

4.17(-08) 

4. 18 (-08) 

4. 07 (-08) 

4.11 (-08) 

8 

2.74(-09) 

2.61(-09) 

2.62(-09) 

2.55(-09) 

2.58 (-09) 

16. 

1,62 (-10) 

1.54(-10) 

. 1 .58 (-10) 

1.53 (-10) 

1.54 (-10) 

32 

1.00(-12) 

1.24 (-12)' 

4.92(12) 

3.51 (-12) 

4. 59 (-12) 


Table 2: Maximum error obtained with ICM-EP and ICM-DQ when At — 10 2 , r — 4 and 
n varies. 


We now study how the algorithms behave when n and r are fixed and At varies. It is 
important to mention that At does not necessarily measure the time step of each method. 
The actual time step A t a for a given method is obtained by A t a = maxitfc+x — t where 
the tfc belong to the time interval under consideration, say [0, (r - l)At]. For ICM-EP and 
ICM-DQ with regular test polynomials, we took Af a = At. For the remaining cases, At a is 
larger than At and ICM-DQ with Lobatto polynomials has the largest value. 



ICM-EP 

ICM-DQ 

At 


Regular 

Lobatto 

Legendre 

Chebyshev 

16 x 10~ z 
8 x 10~ 2 

4 x 10~ 2 
2 x 10 -2 
1 x 10“ 2 

5 x 10 -3 

7.75(-10) 
1.74(-10) 
1.63(-10) 
1.61 (-10) 
1.62(-10) 
1.61 (-10) 

1.37(-08) 
5.39(-09) 
4.86(-10) 
9.05 (-11) 

1.54(-10) 

1.60(-10) 

6. 44 (-09) 
3.75(-09) 
2.61(-10) 
1.21(-10) 
1.58(-10) 
1 .61 (-10) 

9.39 (-09) 
2.22(-09) 
3.74(-10) 
1.06(-10) 
1.53(-10) 
1.60(-10) 

8.13(-09) 

1.93(-09) 

3.34(-10) 

1.11(-10) 

1.54(-10) 

1.61(-10) 


Table 3: For n = 16 and r = 4, maximum error as function of At obtained with ICM-EP and 
ICM-DQ. 


In Table 3, we record the maximum error as function of At when n = 16 and r = 4. 
As At decreases, the errors seem to level off to unique asymptotic value. ICM-DQ requires 
smaller At to reach that asymptotic value. When At is crude, ICM-DQ (as ICM-EP) needs 
few iterations to get to the target time T = 10.0 but needs more to adjust. We observe that 
if we allow the iterations to go well beyond T = 10.0, ICM-EP and TCM-DQ will give similar 
accuracies independent of At and r. 

For the problem studied in this paper, the use of non-uniform time grids (such as zeros 
of Lobatto, Legendre, Chebyshev) does neither deteriorate nor enhance the accuracy of the 
quadrature solutions compared to the use of equally spaced sampling grid points. Tn other 




10 


types of problems; the utilization of non-uniform grids has resulted in better accuracies [3j: 

5 Conclusions 

We compared two implicit methods to numerically solve the two dimensional heat equation. 
The two methods differ by the way that the time derivative is computed. One uses an explicit 
computation of coefficients of polynomials to approximate the time component and the other 
relies on differential quadratures. In the two cases, the spatial derivatives are discretized 
using a fourth-order finite difference scheme. Our numerical results obtained on the Cray 
SV1 showed that both methods produce high accurate solutions and that the one based on 
differential quadrature requires less memory and the smallest elapsed time. 


Acknowledgment: The authors are grateful to the NASA Center for Computational Sci- 
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source to perform this work. 
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